Prediction Sensitivity



Robustness is Important: Limitations of LLMs for Data Fitting

Liu, Hejia, Yang, Mochen, Adomavicius, Gediminas

arXiv.org Machine Learning

Large Language Models (LLMs) are being applied in a wide array of settings, well beyond the typical language-oriented use cases. In particular, LLMs are increasingly used as a plug-and-play method for fitting data and generating predictions. Prior work has shown that LLMs, via in-context learning or supervised fine-tuning, can perform competitively with many tabular supervised learning techniques in terms of predictive performance. However, we identify a critical vulnerability of using LLMs for data fitting -- making changes to data representation that are completely irrelevant to the underlying learning task can drastically alter LLMs' predictions on the same data. For example, simply changing variable names can shift the size of prediction error by as much as 82% in certain settings. Such prediction sensitivity with respect to task-irrelevant variations manifests under both in-context learning and supervised fine-tuning, for both closed-weight and open-weight general-purpose LLMs. Moreover, by examining the attention scores of an open-weight LLM, we discover a non-uniform attention pattern: training examples and variable names/values which happen to occupy certain positions in the prompt receive more attention when output tokens are generated, even though different positions are expected to receive roughly the same attention. This partially explains the sensitivity in the presence of task-irrelevant variations. We also consider a state-of-the-art tabular foundation model (TabPFN) trained specifically for data fitting. Despite being explicitly designed to achieve prediction robustness, TabPFN is still not immune to task-irrelevant variations. Overall, despite LLMs' impressive predictive capabilities, currently they lack even the basic level of robustness to be used as a principled data-fitting tool.
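To make the reported sensitivity concrete, here is a minimal sketch of how one might probe it: the same regression task is serialized into two in-context-learning prompts that differ only in variable names, and the divergence between the resulting predictions is measured. The prompt format and the `query_llm` hook are illustrative assumptions, not the authors' protocol.

```python
# Hedged sketch: measure how a task-irrelevant change in data representation
# (renaming variables) shifts an LLM's in-context-learning predictions.
import numpy as np

def make_prompt(train_rows, test_x, names):
    """Serialize a small regression task as an in-context-learning prompt."""
    lines = ["Predict y from the features below."]
    for x, y in train_rows:
        feats = ", ".join(f"{n}={v:.2f}" for n, v in zip(names, x))
        lines.append(f"{feats} -> y={y:.2f}")
    lines.append(", ".join(f"{n}={v:.2f}" for n, v in zip(names, test_x)) + " -> y=")
    return "\n".join(lines)

def prediction_shift(query_llm, train_rows, test_rows, names_a, names_b):
    """Mean absolute change in predictions when variables are merely renamed."""
    pa = np.array([query_llm(make_prompt(train_rows, t, names_a)) for t in test_rows])
    pb = np.array([query_llm(make_prompt(train_rows, t, names_b)) for t in test_rows])
    return float(np.mean(np.abs(pa - pb)))

# Example wiring (query_llm must be supplied by the user, e.g., a
# chat-completion call that parses a single number out of the reply):
# shift = prediction_shift(query_llm, train_rows, test_rows,
#                          names_a=["x1", "x2", "x3"],
#                          names_b=["alpha", "beta", "gamma"])
```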



Model Interpretation and Explainability: Towards Creating Transparency in Prediction Models

Kridel, Donald, Dineen, Jacob, Dolk, Daniel, Castillo, David

arXiv.org Artificial Intelligence

Explainable AI (XAI) has a counterpart in analytical modeling which we refer to as model explainability. We tackle the issue of model explainability in the context of prediction models. We analyze a dataset of loans from a credit card company using the following three steps: execute and compare four different prediction methods, apply the best known explainability techniques in the current literature to the model training sets to identify feature importance (FI) (static case), and finally to cross-check whether the FI set holds up under "what if" prediction

Model explainability and interpretability are now being perceived as desirable, if not required, features of data science and predictive analytics overall. Our objective here is to examine what these features may look like when applied to previous research we have conducted in the area of econometric prediction and predictive analytics [10]. We consider the domain of Lending Club loan applications. For our dataset, we perform three different analyses: 1. Model Execution and Comparison. Run and compare four different prediction models on the
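As a rough illustration of the first two steps (model execution/comparison and static feature importance), the following sketch uses scikit-learn on synthetic data; the specific models, dataset, and importance technique are assumptions for illustration, not the authors' pipeline.

```python
# Illustrative sketch (not the authors' code): compare several prediction
# models, then rank feature importance for the best one.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.inspection import permutation_importance

# Synthetic stand-in for a loan dataset
X, y = make_classification(n_samples=2000, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

models = {
    "logit": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(random_state=0),
    "gbm": GradientBoostingClassifier(random_state=0),
}

# Step 1: model execution and comparison
scores = {}
for name, m in models.items():
    m.fit(X_tr, y_tr)
    scores[name] = roc_auc_score(y_te, m.predict_proba(X_te)[:, 1])
    print(f"{name}: AUC={scores[name]:.3f}")

# Step 2: feature importance (static case) for the best-performing model
best = models[max(scores, key=scores.get)]
fi = permutation_importance(best, X_te, y_te, n_repeats=10, random_state=0)
print("feature importance ranking:", np.argsort(-fi.importances_mean))
```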


Uncertainty Aware Neural Network from Similarity and Sensitivity

Kabir, H M Dipu, Mondal, Subrota Kumar, Khanam, Sadia, Khosravi, Abbas, Rahman, Shafin, Qazani, Mohammad Reza Chalak, Alizadehsani, Roohallah, Asadi, Houshyar, Mohamed, Shady, Nahavandi, Saeid, Acharya, U Rajendra

arXiv.org Artificial Intelligence

Researchers have proposed several approaches for neural network (NN) based uncertainty quantification (UQ). However, most of these approaches are developed under strong assumptions. Uncertainty quantification algorithms often perform poorly in an input domain, and the reason for the poor performance remains unknown. Therefore, in this paper we present a neural network training method that considers similar samples with sensitivity awareness. In the proposed NN training method for UQ, first, we train a shallow NN for the point prediction. Then, we compute the absolute differences between predictions and targets and train another NN to predict those absolute differences or absolute errors. Domains with high average absolute errors represent high uncertainty. In the next step, we select each sample in the training set one by one and compute both prediction and error sensitivities. Then we select similar samples with sensitivity consideration and save the indexes of the similar samples. The ranges of an input parameter become narrower when the output is highly sensitive to that parameter. After that, we construct initial uncertainty bounds (UB) by considering the distribution of sensitivity-aware similar samples. Prediction intervals (PIs) from the initial uncertainty bounds are larger and cover more samples than required. Therefore, we train a bound-correction NN. As following all the steps for finding the UB for each sample requires a lot of computation and memory access, we train a UB computation NN. The UB computation NN takes an input sample and provides an uncertainty bound. The UB computation NN is the final product of the proposed approach. Scripts of the proposed method are available in the following GitHub repository: github.com/dipuk0506/UQ
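The following is a minimal sketch of the first two stages described above (a shallow point-prediction NN plus a second NN trained on absolute errors) together with a naive initial uncertainty bound. The sensitivity-aware sample selection and bound-correction networks are omitted, and the data, architectures, and scaling factor are illustrative assumptions; see the authors' repository above for the actual scripts.

```python
# Hedged sketch of the point-prediction NN and the absolute-error NN stages.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(500, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1 + 0.1 * np.abs(X[:, 0]), size=500)

# Stage 1: shallow NN for the point prediction
point_nn = MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000, random_state=0)
point_nn.fit(X, y)

# Stage 2: second NN predicts absolute errors (a proxy for local uncertainty)
abs_err = np.abs(y - point_nn.predict(X))
error_nn = MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000, random_state=0)
error_nn.fit(X, abs_err)

# Naive initial uncertainty bound: prediction +/- k * predicted error
X_new = np.array([[0.5], [2.5]])
pred = point_nn.predict(X_new)
half_width = 2.0 * np.clip(error_nn.predict(X_new), 0.0, None)  # k = 2 assumed
for x, p, h in zip(X_new[:, 0], pred, half_width):
    print(f"x={x:+.1f}: PI = [{p - h:.2f}, {p + h:.2f}]")
```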


On the Relation between Sensitivity and Accuracy in In-context Learning

Chen, Yanda, Zhao, Chen, Yu, Zhou, McKeown, Kathleen, He, He

arXiv.org Artificial Intelligence

In-context learning (ICL) suffers from oversensitivity to the prompt, making it unreliable in real-world scenarios. We study the sensitivity of ICL with respect to multiple perturbation types. First, we find that label bias obscures the true sensitivity, and therefore prior work may have significantly underestimated ICL sensitivity. Second, we observe a strong negative correlation between ICL sensitivity and accuracy: predictions sensitive to perturbations are less likely to be correct. Motivated by these findings, we propose SenSel, a few-shot selective prediction method that abstains from sensitive predictions. Experiments on ten classification datasets show that SenSel consistently outperforms two commonly used confidence-based and entropy-based baselines on abstention decisions.
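A hedged sketch of the underlying idea of sensitivity-based selective prediction: query the model under several task-preserving prompt perturbations and abstain when the predicted label is unstable. The `predict_label` and `perturb_prompt` hooks and the agreement threshold below are illustrative assumptions, not the SenSel algorithm as specified in the paper.

```python
# Sketch of abstaining from sensitive predictions, under assumed hooks.
from collections import Counter
from typing import Callable, Optional, Tuple

def sensitive_selective_predict(
    prompt: str,
    predict_label: Callable[[str], str],       # e.g., an ICL classification call
    perturb_prompt: Callable[[str, int], str],  # e.g., reorder/swap demonstrations
    n_perturbations: int = 8,
    agreement_threshold: float = 0.75,
) -> Tuple[Optional[str], Optional[str]]:
    """Return (label, None) if stable, or (None, 'abstain') if too sensitive."""
    labels = [predict_label(perturb_prompt(prompt, i)) for i in range(n_perturbations)]
    labels.append(predict_label(prompt))
    majority_label, count = Counter(labels).most_common(1)[0]
    agreement = count / len(labels)
    if agreement >= agreement_threshold:
        return majority_label, None
    return None, "abstain"
```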


Prediction Sensitivity: Continual Audit of Counterfactual Fairness in Deployed Classifiers

Maughan, Krystal, Ngong, Ivoline C., Near, Joseph P.

arXiv.org Artificial Intelligence

As AI-based systems increasingly impact many areas of our lives, auditing these systems for fairness is an increasingly high-stakes problem. Traditional group fairness metrics can miss discrimination against individuals and are difficult to apply after deployment. Counterfactual fairness describes an individualized notion of fairness but is even more challenging to evaluate after deployment. We present prediction sensitivity, an approach for continual audit of counterfactual fairness in deployed classifiers. For every prediction made by the deployed model, prediction sensitivity helps answer the question: would this prediction have been different if this individual had belonged to a different demographic group? Prediction sensitivity can leverage correlations between protected status and other features and does not require protected status information at prediction time. Our empirical results demonstrate that prediction sensitivity is effective for detecting violations of counterfactual fairness.


Towards Auditability for Fairness in Deep Learning

Ngong, Ivoline C., Maughan, Krystal, Near, Joseph P.

arXiv.org Artificial Intelligence

Group fairness metrics can detect when a deep learning model behaves differently for advantaged and disadvantaged groups, but even models that score well on these metrics can make blatantly unfair predictions. We present smooth prediction sensitivity, an efficiently computed measure of individual fairness for deep learning models that is inspired by ideas from interpretability in deep learning. Smooth prediction sensitivity allows individual predictions to be audited for fairness. We present preliminary experimental results suggesting that smooth prediction sensitivity can help distinguish between fair and unfair predictions, and that it may be helpful in detecting blatantly unfair predictions from "group-fair" models.
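One possible reading of such a measure, sketched under assumptions: average the gradient of the model output with respect to the protected feature over noisy copies of the input, in the spirit of SmoothGrad-style smoothing from the interpretability literature. The model, noise scale, and protected-feature index below are illustrative, not the authors' exact formulation.

```python
# Hedged sketch of a smoothed, gradient-based sensitivity score (assumed form).
import torch

def smooth_prediction_sensitivity(model, x, protected_idx, n_samples=32, sigma=0.1):
    """Average |d output / d protected feature| over noisy copies of the input."""
    grads = []
    for _ in range(n_samples):
        noisy = (x + sigma * torch.randn_like(x)).detach().requires_grad_(True)
        model(noisy).sum().backward()
        grads.append(noisy.grad[..., protected_idx].abs())
    return torch.stack(grads).mean().item()

# Toy usage with an assumed model and protected column index 0
model = torch.nn.Sequential(torch.nn.Linear(5, 16), torch.nn.ReLU(), torch.nn.Linear(16, 1))
x = torch.randn(1, 5)
print("smooth prediction sensitivity:", smooth_prediction_sensitivity(model, x, protected_idx=0))
```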


BayReL: Bayesian Relational Learning for Multi-omics Data Integration

Hajiramezanali, Ehsan, Hasanzadeh, Arman, Duffield, Nick, Narayanan, Krishna R, Qian, Xiaoning

arXiv.org Machine Learning

High-throughput molecular profiling technologies have produced high-dimensional multi-omics data, enabling systematic understanding of living systems at the genome scale. Studying molecular interactions across different data types helps reveal signal transduction mechanisms across different classes of molecules. In this paper, we develop a novel Bayesian representation learning method that infers the relational interactions across multi-omics data types. Our method, Bayesian Relational Learning (BayReL) for multi-omics data integration, takes advantage of a priori known relationships among the same class of molecules, modeled as a graph at each corresponding view, to learn view-specific latent variables as well as a multi-partite graph that encodes the interactions across views. Our experiments on several real-world datasets demonstrate enhanced performance of BayReL in inferring meaningful interactions compared to existing baselines.


Towards a Measure of Individual Fairness for Deep Learning

Maughan, Krystal, Near, Joseph P.

arXiv.org Artificial Intelligence

Deep learning has produced big advances in artificial intelligence, but trained neural networks often reflect and amplify bias in their training data, and thus produce unfair predictions. We propose a novel measure of individual fairness, called prediction sensitivity, that approximates the extent to which a particular prediction is dependent on a protected attribute. We show how to compute prediction sensitivity using standard automatic differentiation capabilities present in modern deep learning frameworks, and present preliminary empirical results suggesting that prediction sensitivity may be effective for measuring bias in individual predictions.
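As a minimal sketch of this idea (an assumed form, not the authors' code), prediction sensitivity can be approximated as the magnitude of the gradient of a model's output with respect to the protected attribute, obtained directly from standard autograd:

```python
# Hedged sketch: gradient of the prediction w.r.t. the protected attribute.
import torch

# Assumed toy model and protected column index for illustration
model = torch.nn.Sequential(torch.nn.Linear(5, 16), torch.nn.ReLU(), torch.nn.Linear(16, 1))
protected_idx = 0

x = torch.randn(1, 5, requires_grad=True)
model(x).sum().backward()

# A large magnitude suggests the prediction depends strongly on the protected attribute.
sensitivity = x.grad[0, protected_idx].abs().item()
print("prediction sensitivity:", sensitivity)
```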